Optimal state dependent spectral representation for HMM modeling : a new theoretical framework

Authors

  • Chafic Mokbel
  • Guillaume Gravier
  • Gérard Chollet
Abstract

In this paper we propose a theoretical framework that extends the classical continuous-density HMM so that different spectral representations can be used depending on the state. We stress the need for a reference space and for spectral transformations between the spectral representation spaces of the model and the reference space. We show that this framework yields more precise pdfs in the reference space. Preliminary speech recognition experiments with two spectral representations, MFCC and linear-frequency-scale cepstral coefficients, show no improvement; however, they indicate that the choice of the spectral representations is crucial and that determining the transformations between spaces is a complex problem.

2. THEORETICAL FRAMEWORK

2.1 Model Definition

Classically, an HMM $\lambda$ with $Q$ states and Gaussian sub-process pdfs is defined by the following parameters:

  • $\pi = \{\pi_i\}_{i=1,\dots,Q}$: the probabilities of occupying state $i$ at the first instant.
  • $A = \{a_{ij}\}_{i,j=1,\dots,Q}$: the transition probabilities from state $i$ to state $j$.
  • $B = \{b_i(X_\tau)\}_{i=1,\dots,Q}$: the Gaussian sub-processes associated with the different states of the model. Each Gaussian distribution has two sets of parameters: its mean $\mu_i$ and its covariance matrix $\Gamma_i$.

Here, we propose to associate with each state of the model a given spectral representation, identified by its index $\alpha$. We assume that the feature vectors of all representations belong to $\mathbb{R}^p$. This assumption is necessary in order to perform "Maximum Likelihood" training and classification. In this case the sub-process pdfs are defined as:

  • $B = \{b_i(X_{\tau,\alpha_i})\}_{i=1,\dots,Q}$ with $\alpha_i \in \{1, \dots, M\}$: each distribution is identified by the spectral representation index in addition to the classical mean vector and covariance matrix.

However, as defined, the new HMM can be neither trained nor used for classification, since the observed outputs do not belong to a predefined and unique space, even if the different representation spaces have the same dimension $p$.
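The parameter set defined above can be pictured as a small data structure. The following Python sketch is only an illustration, not code from the paper: the class names `State` and `StateDependentHMM`, the field names, and the `validate` helper are all hypothetical. It stores $\pi$, $A$, and, for each state, the mean, the covariance matrix and the representation index $\alpha_i$, and it checks the constraints stated in the text (probabilities summing to one, $\alpha_i \in \{1, \dots, M\}$, common dimension $p$).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """One Gaussian sub-process (hypothetical container, not from the paper)."""
    mean: List[float]        # mu_i, expressed in representation alpha_i
    cov: List[List[float]]   # Gamma_i, a p x p covariance matrix
    rep_index: int           # alpha_i, in {1, ..., M}


@dataclass
class StateDependentHMM:
    """HMM whose states each carry their own spectral representation index."""
    initial: List[float]             # pi_i, i = 1..Q
    transitions: List[List[float]]   # a_ij, i, j = 1..Q
    states: List[State]              # one Gaussian sub-process per state
    num_representations: int         # M

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if the parameters violate the model definition."""
        q = len(self.states)
        if q == 0:
            raise ValueError("the model needs at least one state")
        if len(self.initial) != q or abs(sum(self.initial) - 1.0) > tol:
            raise ValueError("initial probabilities must have Q entries summing to 1")
        if len(self.transitions) != q:
            raise ValueError("the transition matrix must have Q rows")
        for row in self.transitions:
            if len(row) != q or abs(sum(row) - 1.0) > tol:
                raise ValueError("each transition row must have Q entries summing to 1")
        p = len(self.states[0].mean)
        for s in self.states:
            if not 1 <= s.rep_index <= self.num_representations:
                raise ValueError("alpha_i must lie in {1, ..., M}")
            if len(s.mean) != p or len(s.cov) != p or any(len(r) != p for r in s.cov):
                raise ValueError("all representations must share the dimension p")
```

Such a container only records which representation each state uses; it does not by itself solve the problem raised in the text, namely that likelihoods computed in different spaces are not directly comparable.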
In fact, the spectral representation of an observation depends on the state of the HMM, which is hidden and not observed. This means that the spectral representation of a frame may vary across word hypotheses, and within a single hypothesis during training, which makes it difficult to tie the measured likelihood to a probability.

A solution to this problem consists in defining a reference space with a spectral representation $\hat{\alpha}$ (MFCC, for example). Suppose that a function $T_{\alpha/\hat{\alpha}}$ maps the space $X_{\alpha}$ onto the space $X_{\hat{\alpha}}$:

$$X_{\tau,\hat{\alpha}} = T_{\alpha/\hat{\alpha}}(X_{\tau,\alpha}) \tag{1}$$

Then the density for a state $i$ can be written:

$$b_i(X_{\tau,\hat{\alpha}}) = \frac{b_i(X_{\tau,\alpha_i})}{\|J(X_{\tau,\alpha_i})\|} \tag{2}$$

where $J(X_{\tau,\alpha_i})$ is the Jacobian matrix whose $(k,l)$ element is:

$$J_{k,l}(X_{\tau,\alpha}) = \frac{\partial\, T_{\alpha/\hat{\alpha}}(X_{\tau,\alpha})_k}{\partial X_{\tau,\alpha,l}} \tag{3}$$

and $\|J\|$ is understood as the absolute value of its determinant. Using Eqs. (2) and (3), the observations of the newly defined HMM belong to a unique space $X_{\hat{\alpha}}$. A very important question arises at this stage:

  • How does the new model differ from an HMM defined entirely in the space $X_{\hat{\alpha}}$?

The new model differs from the $X_{\hat{\alpha}}$ model in the form of its sub-process distributions. For a given state, it may be more reliable to approximate the distribution of the data with a Gaussian in a space $X_{\alpha}$ than to approximate the distribution of the same data with a Gaussian in the reference space $X_{\hat{\alpha}}$. Thus, the data in $X_{\hat{\alpha}}$ may follow a more complex and more precise probability density function, depending on the space transformation. Hence, through the Jacobian matrices, the corresponding distribution in the reference space would be more precise. Moreover, more precise distributions generally lead to a more precise and robust model. Nevertheless, the estimation of these Jacobian matrices remains a major problem, as we will see in the following.

2.2 Computation of the Space Transformation Function

Determining the transformation $T_{\alpha/\hat{\alpha}}$ that maps a given space onto the reference space is not always straightforward.
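As a concrete illustration of Eqs. (1) to (3), the sketch below evaluates a state density in the reference space for one deliberately simple transformation. Everything specific here is an assumption made for the example and not taken from the paper: the transformation is an element-wise exponential, the covariance matrix is diagonal, and the function names are hypothetical. With this choice the Jacobian is diagonal, so the logarithm of its determinant is just the sum of the input components.

```python
import math


def gaussian_logpdf_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at the vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xk - m) ** 2 / v)
        for xk, m, v in zip(x, mean, var)
    )


def transform(x):
    """Hypothetical T (Eq. 1): element-wise exponential onto the reference space."""
    return [math.exp(xk) for xk in x]


def log_abs_det_jacobian(x):
    """log |det J| (Eq. 3) for the element-wise exponential.

    J is diagonal with entries exp(x_k), so the log-determinant is sum(x_k).
    """
    return sum(x)


def reference_logpdf(x, mean, var):
    """Eq. (2) in log form: log b(X_ref) = log b(X_alpha) - log |det J(X_alpha)|."""
    return gaussian_logpdf_diag(x, mean, var) - log_abs_det_jacobian(x)


# Usage: a Gaussian state density in the model space, evaluated in the reference space.
x_alpha = [0.3, -1.2]
mean, var = [0.0, -1.0], [1.0, 0.5]
x_ref = transform(x_alpha)
log_density_ref = reference_logpdf(x_alpha, mean, var)
```

In this toy case the density obtained in the reference space is a log-normal one, that is, a non-Gaussian pdf produced by a Gaussian model in another space, which is the effect the framework relies on.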
Thus, one can approximate this transformation by a simpler function chosen for its mathematical convenience, such as a regression matrix, an LMR, a linear-quadratic function, a Volterra function, etc. Once the form of the function is chosen, its parameters must be estimated. This can be done using classical criteria such as "Minimum Mean Square Error" (MMSE). Note that estimating the transformation parameters may be easier here because there is no alignment problem: the different analysis techniques are applied to the same frames of signal.

2.3 Training Algorithm

The parameters of the new model may be trained using the classical EM algorithm. In the "Estimate" step, the auxiliary function relative to a given state is computed for the $M$ possible spectral representations. During the "Maximize" step, the mean and covariance matrices that maximize the auxiliary function are determined for all the spectral representations, and for a given distribution the optimal spectral representation is chosen according to:

$$\alpha_{opt} = \arg\max_{\alpha} \sum_{\tau} \left[ -\frac{1}{2} \log \|\Gamma_{i,\alpha}\| - \log \|J(X_{\tau,\alpha})\| \right] \tag{4}$$

The second term follows from taking the logarithm of Eq. (2). This equation shows clearly that the Jacobian term is crucial to compensate for a decrease of the variations within the state.

3. PARTICULAR APPLICATION: VARIABLE RESOLUTION CEPSTRAL COEFFICIENTS

Different spectral representations may be obtained by varying the spectral resolution of the filter bank used to compute the MFCC coefficients. In [6], a computational method using the FFT is proposed for spectra with nonuniform resolution on the frequency scale. A procedure for implementing a frequency-warping all-pass transformation, known as the bilinear transformation, estimates the AR parameters (of order $p$) using the outputs of $p$ successive filters. These dephasing filters are preceded by a corrective filter in order to obtain an all-pass filter.
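Returning briefly to the selection rule of Eq. (4) in Section 2.3, the following sketch shows how the criterion could be evaluated for one state. It is a simplified illustration under stated assumptions, not the authors' implementation: the covariance matrices are taken as diagonal, every frame is assumed to be assigned to the state with weight one, the sign of the Jacobian term is the one implied by the logarithm of Eq. (2), and the function names are hypothetical.

```python
import math


def ml_variances(frames):
    """Per-dimension maximum-likelihood variances of a list of feature vectors."""
    n, p = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(p)]
    return [sum((f[k] - means[k]) ** 2 for f in frames) / n for k in range(p)]


def selection_score(frames, log_abs_det_jac):
    """Eq. (4) criterion for one representation (diagonal covariance assumed).

    frames:          feature vectors of the state, in the candidate representation
    log_abs_det_jac: function returning log |det J| of the mapping to the
                     reference space, evaluated at one frame
    """
    log_det_cov = sum(math.log(v) for v in ml_variances(frames))
    return sum(-0.5 * log_det_cov - log_abs_det_jac(f) for f in frames)


def select_representation(candidates):
    """Return (best index, all scores) for a dict: index -> (frames, log_abs_det_jac)."""
    scores = {a: selection_score(fr, lj) for a, (fr, lj) in candidates.items()}
    return max(scores, key=scores.get), scores


# Usage: representation 2 is representation 1 scaled by 0.5, so mapping it back to
# the reference space multiplies by 2 and the Jacobian determinant is 2 per frame.
frames_ref = [[0.1], [1.3], [-0.7], [2.0]]
frames_scaled = [[0.5 * f[0]] for f in frames_ref]
best, scores = select_representation({
    1: (frames_ref, lambda f: 0.0),
    2: (frames_scaled, lambda f: math.log(2.0)),
})
```

In this toy case the two scores coincide: the scaled representation has a smaller variance, but the Jacobian term removes exactly that advantage. Without the Jacobian term the criterion would favour any representation that merely shrinks the data.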
If first-order dephasing filters are considered, then the frequency transformation is expressed by the following relation in the normalized frequency domain:




Publication date: 1997